Global Thresholding and Multiple Pass Parsing
We present a variation on classic beam thresholding techniques that is up to
an order of magnitude faster than the traditional method, at the same
performance level. We also present a new thresholding technique, global
thresholding, which, combined with the new beam thresholding, gives an
additional factor of two improvement, and a novel technique, multiple pass
parsing, that can be combined with the others to yield yet another 50%
improvement. We use a new search algorithm to simultaneously optimize the
thresholding parameters of the various algorithms.
Comment: Fixed latex errors; fixed minor errors in published versio
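As an illustration of the classic beam thresholding that this abstract takes as its baseline, the sketch below prunes one chart cell by keeping only the entries whose probability is within a fixed ratio of the best entry in that cell. The function name `beam_prune`, the `beam_ratio` parameter, and the toy probabilities are assumptions made for this example; the sketch does not reproduce the paper's variation, its global thresholding, or its parameter search.

```python
def beam_prune(cell, beam_ratio):
    """Classic beam thresholding on a single chart cell (illustrative sketch).

    cell maps a nonterminal label to its probability for one span.
    Entries scoring below beam_ratio times the best entry are dropped.
    """
    if not cell:
        return {}
    best = max(cell.values())
    cutoff = best * beam_ratio
    return {label: p for label, p in cell.items() if p >= cutoff}


# Hypothetical cell contents, chosen only to show the pruning behaviour.
cell = {"NP": 0.02, "VP": 0.001, "S": 0.00001}
pruned = beam_prune(cell, beam_ratio=1e-2)
```

A tighter ratio (closer to 1) prunes more entries and parses faster, at the risk of discarding a constituent that the best parse needs.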
Efficient Algorithms for Parsing the DOP Model
Excellent results have been reported for Data-Oriented Parsing (DOP) of
natural language texts (Bod, 1993). Unfortunately, existing algorithms are both
computationally intensive and difficult to implement. Previous algorithms are
expensive due to two factors: the exponential number of rules that must be
generated and the use of a Monte Carlo parsing algorithm. In this paper we
solve the first problem by a novel reduction of the DOP model to a small,
equivalent probabilistic context-free grammar. We solve the second problem by a
novel deterministic parsing strategy that maximizes the expected number of
correct constituents, rather than the probability of a correct parse tree.
Using the optimizations, experiments yield a 97% crossing brackets rate and 88%
zero crossing brackets rate. This differs significantly from the results
reported by Bod, and is comparable to results from a duplication of Pereira and
Schabes's (1992) experiment on the same data. We show that Bod's results are at
least partially due to an extremely fortuitous choice of test data, and
partially due to using cleaner data than other researchers.
Comment: 10 page
Skills, Schools, and Credit Constraints
Low college enrollment rates among low income students may stem from credit constraints, low academic skill, low quality schools, or some combination of these. Recent Massachusetts data allow the first use of school district fixed effects in the analysis of credit constraints, leading to four primary findings. First, Massachusetts' low income students have lower intended college enrollment rates than higher income students but also have dramatically lower skills and attend lower quality school districts. Second, inclusion of skill controls greatly reduces but does not eliminate the intended enrollment gap, with low income students seven percentage points less likely to intend enrollment than similarly skilled higher income students. Third, in districts where higher income students are plausibly unconstrained, inclusion of school district fixed effects does little to reduce intended enrollment gaps, with low income students nine percentage points less likely to intend enrollment than similarly skilled higher income students from the same school district. Fourth, low income students in the middle and upper parts of the skill distribution appear the most constrained, particularly with respect to four-year public colleges. State governments could use the methods employed here to identify credit constrained student populations in order to target financial aid more efficiently.
Parsing Inside-Out
The inside-outside probabilities are typically used for reestimating
Probabilistic Context Free Grammars (PCFGs), just as the forward-backward
probabilities are typically used for reestimating HMMs. I show several novel
uses, including improving parser accuracy by matching parsing algorithms to
evaluation criteria; speeding up DOP parsing by 500 times; and 30 times faster
PCFG thresholding at a given accuracy level. I also give an elegant,
state-of-the-art grammar formalism, which can be used to compute inside-outside
probabilities; and a parser description formalism, which makes it easy to
derive inside-outside formulas and many others.
Comment: Ph.D. Thesis, 257 pages, 40 postscript figure
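For readers unfamiliar with the inside probabilities mentioned in this abstract, the following is a minimal sketch of the standard CKY-style inside computation for a PCFG in Chomsky normal form. The function name, the dictionary encodings of the rules, and the toy grammar are assumptions made for this example; the sketch does not implement the thesis's grammar or parser description formalisms, and it omits the outside pass.

```python
from collections import defaultdict


def inside_probabilities(words, lexical, binary):
    """Inside probabilities for a PCFG in Chomsky normal form (sketch).

    lexical maps (A, word) to P(A -> word).
    binary maps (A, B, C) to P(A -> B C).
    Returns a table where inside[(i, j, A)] is the probability that A
    derives words[i:j].
    """
    n = len(words)
    inside = defaultdict(float)

    # Spans of length 1 come from the lexical rules.
    for i, w in enumerate(words):
        for (a, word), p in lexical.items():
            if word == w:
                inside[(i, i + 1, a)] += p

    # Longer spans sum over every split point and every binary rule.
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for (a, b, c), p in binary.items():
                    left = inside.get((i, k, b), 0.0)
                    right = inside.get((k, j, c), 0.0)
                    if left and right:
                        inside[(i, j, a)] += p * left * right
    return inside


# Hypothetical toy grammar, used only to exercise the function.
lexical = {("NP", "she"): 0.5, ("NP", "fish"): 0.5, ("V", "eats"): 1.0}
binary = {("S", "NP", "VP"): 1.0, ("VP", "V", "NP"): 1.0}
table = inside_probabilities(["she", "eats", "fish"], lexical, binary)
sentence_probability = table[(0, 3, "S")]
```

In this toy grammar the sentence has a single parse, so `sentence_probability` is just the product of the rule probabilities used in that parse.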
The Wages of Sinistrality: Handedness, Brain Structure and Human Capital Accumulation
Left- and right-handed individuals have different brain structures, particularly in relation to language processing. Using five data sets from the US and UK, I show that poor infant health increases the likelihood of a child being left-handed. I argue that handedness can thus be used to explore the long-run impacts of differential brain structure generated in part by poor infant health. Even conditional on infant health and family background, lefties exhibit economically and statistically significant human capital deficits relative to righties. Compared to righties, lefties score a tenth of a standard deviation lower on measures of cognitive skill and, contrary to popular wisdom, are not over-represented at the high end of the distribution. Lefties have more emotional and behavioral problems, have more learning disabilities such as dyslexia, complete less schooling, and work in less cognitively intensive occupations. Differences between left- and right-handed siblings are similar in magnitude. Most strikingly, lefties have six percent lower annual earnings than righties, a gap that can largely be explained by these differences in cognitive skill, disabilities, schooling and occupational choice. Lefties work in more manually intensive occupations than do righties, further suggesting that lefties’ primary labor market disadvantage is cognitive rather than physical. Those likely to be left-handed due to genetics show smaller or no deficits relative to righties, suggesting the importance of environmental shocks as the source of disadvantage. Handedness provides parents and schools a costlessly observable characteristic with which to identify young children whose cognitive and behavioral development may warrant additional attention.
Gold Standards?: State Standards Reform and Student Achievement
Proponents of the recent and widely adopted Common Core State Standards argue that high quality curricular standards are critical to students’ educational success. Little clear evidence exists, however, linking the quality of such standards to student achievement. I remedy this by connecting data on state-level student achievement from 1994-2011 with measures of the quality of states’ curricular standards as judged by two independent organizations at three different moments in time. I show that, within states, changes in the quality of standards have little impact on overall student achievement. Improved standards do, however, raise achievement of 8th graders in low-scoring states, particularly for low-scoring students. Given the known weaknesses of U.S. middle schools, this result suggests that standards may be beneficial in settings where pedagogy would otherwise be poor.